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ABSTRACT 

A method is presented for Bayesian model selection without explicitly computing evidences, 
by using a combined likelihood and introducing an integer model selection parameter n so 
that Bayes factors, or more generally posterior odds ratios, may be read off directly from the 
posterior of n. If the total number of models under consideration is specified a priori, the full 
joint parameter space (6, n) of the models is of fixed dimensionality and can be explored using 
standard Markov chain Monte Carlo (MCMC) or nested sampling methods, without the need 
for reversible jump MCMC techniques. The posterior on n is then obtained by straightforward 
marginalisation. We demonstrate the efficacy of our approach by application to several toy 
models. We then apply it to constraining the dark energy equation-of-state using a free-form 
reconstruction technique. We show that ACDM is significantly favoured over all extensions, 
including the simple iv(z)=constant model. 

Key words: methods: statistical - methods: data analysis - dark energy - equation of state 
- cosmological parameters 


1 INTRODUCTION 

Comparing two or more models given some data is central to the 
scientific method. The field of model selection within statistical 
inference attempts to address this problem, and numerous tech¬ 
niques for choosing between models exist, including: Akaike's In¬ 
formation Criterion (Akaike 1974), Schwarz’s Bayesian Informa¬ 
tion Criterion (Schwarz 1978) and the Bayesian evidence (Jeffreys 
1961; MacKay 2003). Here we focus on Bayesian model selec¬ 
tion using the evidence Z. (also known as the prior predictive or 
marginal likelihood) and posterior odds ratios Pjj (a generalisation 
of the more commonly used Bayes Factors ’B i f), as this technique 
is inherent to Bayes theorem and both are widely used throughout 
cosmology and astrophysics (Liddle et al. 2006). 

Posterior odds ratios provide a quantitative means for select¬ 
ing between models and are usually calculated directly from the 
evidence of each model. In higher dimensions, techniques to cal¬ 
culate evidences include thermodynamic integration (also known 
as simulated annealing) (Gelman & Meng 1998), approximations 
to the evidence when certain favourable conditions are met (such 
as unimodality and Gaussianity) (Tierney & B. 1986; Liddle et al. 
2006) and nested sampling (Sivia & Skilling 2006; Skilling 2004, 
2006). Calculating Bayes factors directly, without calculating Z 
for each model, is also possible using the Savage-Dickey density 
ratio for nested models (where a more complex model reduces 
to the simpler by setting its additional parameters appropriately) 
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(Verdinelli & Wasserman 1995). A good review from before nested 
sampling’s rise in popularity can be found in Clyde et al. (2007); 
for a thorough review of these methods in cosmology see Trotta 
(2008). 

In this paper we propose a method to calculate posterior odds 
ratios without the problems associated with evidence calculations 
or simplifying assumptions. Posterior odds ratios are calculated di¬ 
rectly from a set of models explored simultaneously without con¬ 
straints on the forms these models might take. The new method 
circumvents the challenges associated with accurate evidence cal¬ 
culations by computing posterior odds ratios using Bayesian pa¬ 
rameter estimation, which is typically a more reliable and com¬ 
putationally less expensive task. Additionally, parameter estima¬ 
tion algorithms are more commonly used and therefore the method 
provides an easy means for extending existing codes to the domain 
of model selection. This is achieved by introducing a parameter 
that selects between models, and allows the calculation of poste¬ 
rior odds ratios from the posterior probability of this parameter. 
We note that similar approaches have been proposed previously 
(Hobson & McLachlan 2003; Goyder & Lasenby 2004; Brewer & 
Donovan 2015), but these typically rely on the use of sampling 
techniques capable of jumping between parameter spaces of dif¬ 
ferent sizes, such as reversible jump MCMC (Green 1995), which 
requires special sampling methods that are often very computa¬ 
tionally demanding. Our approach is much simpler, requiring no 
special sampling methods, provided the number of models under 
consideration is specified a priori, and is related to the class of 
product-space MCMC methods originally proposed by Carlin & 
Chib (1995) (see also Sisson (2005); Lodewyckx et al. (201 1)). 
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We apply our method to toy models and the cosmological 
problem of constraining the dark energy equation of state, with 
particular emphasis on determining the complexity supported by 
data for deviations from ACDM. In both cases we are solving the 
problem of how many nodes are required in a piecewise linear 
model to reconstruct a one-dimensional function. With the num¬ 
ber of nodes defining the models, we show explicitly that this new 
method agrees with the evidences-based approach for calculating 
posterior odds ratios. 

The rest of the paper is organised as follows. Section 2 pro¬ 
vides a brief statistical overview of posterior odds ratios and evi¬ 
dence calculation. Section 3 discusses the statistical framework for 
calculating posteriors odds ratios using parameter estimation in¬ 
stead of calculating evidences. Thereafter, results are presented in 
Section 4 for a toy model data fitting problem and in Section 5 for 
the cosmological problem of characterising the dark energy equa¬ 
tion of state parameter as a function of redshift using recent cos¬ 
mological datasets. We summarise our findings and conclude in 
Section 6. 


2 BACKGROUND 


Bayes Theorem (Bayes & Price 1763; MacKay 2003; Sivia & 
Skilling 2006) states that, 


?r(X\YJ) = 


Pr( Y\X, 1 ) Pr(A|/) 
Pr(T|/) 


( 1 ) 


where X and Y are propositions, Pr(7Q specifies our belief that 
the proposition is true, and I is the background information. Us¬ 
ing this we can calculate the probability that a set of parameters 8 
of a model A4 takes specific values given some data D to constrain 
them (note we drop the dependence on 1 as it is implicit through¬ 
out): 


Pr(0|£>, M) 


Pr(£>|6»,A4)Pr(6>|A4) 
Pr(D | M) 


£n 

~Z' 


( 2 ) 


where £. n and _Z are shorthands for the likelihood, prior, and ev¬ 
idence respectively. This is Bayesian parameter estimation, where 
Pr(01 D, M) is the posterior probability distribution. Similarly, we 
can calculate the probability of a model given some data: 


Pr(M|2» = 


Pr(£>|M)Pr(M) 

PrCD) 


Z 7Tj\.[ 
Pr(£>)' 


( 3 ) 


Taking the ratio of the probabilities of two models signifies our 
degree of belief in one model over another. Taking the logarithm 
of this ratio and using equation (3) above gives us posterior odds 
ratios: 


Pij = In 


Pr(M, | O) 
Pr(M,|£>) 



+ In 



( 4 ) 


If 7Tmi = then Pij = S,y, the Bayes factor, which is more com¬ 
monly used in the literature despite being a less general treatment 
than the fully Bayesian posterior odds ratios that also considers the 
prior probability of each model. For both, criteria to give meaning 
to this quantification are given by the Jeffreys guideline (Jeffreys 
1961), shown in Table 1. Model selection using Bayesian statis¬ 
tics thus requires the calculation of ratios of evidences. Typically 
the evidences are first calculated separately and their ratios eval¬ 
uated. Calculating the evidence for each model is inherently diffi¬ 
cult. From equation (2) we see that Z is a normalisation constant 
for Pr (8 1D, Al), allowing us to calculate it as 


Z= f £.{0)n(e) d6. 
Jail 0 


( 5 ) 


POR 

Favouring of Mj over Al, 

0.0 « Pij 1.0 

None 

1.0 « Pij < 2.5 

Slight 

2.5 « P u < 5.0 

Significant 

5.0 « Pij 

Decisive 


Table 1 . Jeffreys guideline for interpreting PORs. As Pj[ =- Pij, negative 
PORs imply reversed model favouring. 


Equation (5) is a multi-dimensional integral over the whole param¬ 
eter space of a model. Computationally it is not possible to calcu¬ 
late these by brute force even for modest dimensionalities, and the 
techniques mentioned in the introduction have been developed as 
an alternative means to do so. The most promising of these tech¬ 
niques is nested sampling, and with steady advances made in both 
computing power and algorithms to implement nested sampling, 
many cosmological and astrophysical model selection problems 
can now be solved by computing evidences, which is the current 
standard practice. 


3 METHOD 

We propose a method here for calculating posterior odds ratios, 
using parameter estimation techniques, that avoids calculating ev¬ 
idences directly. The method places no constraints on the models 
that can be considered and has the advantage of being simple to im¬ 
plement and undisruptive for members of the community familiar 
with Bayesian parameter estimation techniques. 

Consider a number of different models AT, (n = 1 , 2 ,..., N). 
We combine these into a single hyper-model AT The parameters 
of M are the integer variable n that ‘switches' between the models 
Af„, and the union 6 of the parameter vectors 8„ of each individual 
model. Note that, if there is some overlap between the parameter 
vectors 8„ and dp of two different models, then the coincident pa¬ 
rameters are notionally included only once in the union 6. In prac¬ 
tice, the parameter n can be implemented as a continuous parame¬ 
ter and a suitable binning used to convert it to an effective integer 
parameter, thereby simplifying the implementation (provided the 
technique used to explore the parameter space does not rely on gra¬ 
dient information). Indeed, the implementation of our approach is, 
in general, straightforward, since one needs only to write a simple 
‘wrapper’ hyper-likelihood function for AT which calls the exist¬ 
ing likelihood function for the appropriate individual model AT, 
depending on the (integer) value of n. 

In general, the parameter vectors 0„ and 8p for different mod¬ 
els will be of different dimensionalities. In the case of nested mod¬ 
els, where 8 n c 8, 1+l , such problems are usually accommodated us¬ 
ing reversible-jump Markov chain Monte Carlo (RJMCMC) meth¬ 
ods, which are capable of making transitions between spaces of 
different dimensionality. In principle, such methods might also be 
used in the case of non-nested models, even in the extreme case 
where 0„ and dp have no parameters in common, although such 
applications have not been widely explored. 

Here we adopt a different approach that accommodates nested 
and non-nested models equally well, including the extreme case 
mentioned above, and avoids the algorithmic complication and 
computational expense of RJMCMC methods. The only assump¬ 
tion required is that N (the number of models under considera¬ 
tion) is known a priori. Although this seems an innocuous require¬ 
ment, it does constitute a mild limitation. Consider, for example, 
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the classic nested problem of fitting a polynomial of unknown de¬ 
gree to a set of ( x,y ) data points. In our approach, one is required 
to fix the maximum allowed degree N of the polynomial in ad¬ 
vance, whereas this is not necessary in the traditional RJMCMC 
approach. Nonetheless, in realistic applications such a limitation is 
not too severe. 

By fixing N, the full parameter space (0, n) is determined a 
priori , and is of fixed dimensionality, so it may be explored using 
standard sampling methods, such as MCMC or nested sampling 
(MacKay 2003; Skilling 2006; Brewer et al. 2011). Explicitly, sup¬ 
pose at some MCMC step or nested sampling iteration one consid¬ 
ers the point (0, n) (possibly after suitable binning of the continu¬ 
ous parameter n to obtain an integer value). For any given value 
of n so obtained, the union parameter space may be partitioned 
into those parameters 0„ on which the model M„ depends and the 
remaining parameters <t> n that are not used by M n . The ‘wrapper’ 
hyper-likelihood function thus may pass only the parameters 6„ 
to the likelihood function for the appropriate model M„. The re¬ 
maining parameters </>„ are thus ‘ignored", which is equivalent to 
assigning a constant likelihood value over this subspace. By con¬ 
sidering the full space (0, n ), however, the sampling method will 
typically need to accommodate moderate to large dimensionality, 
most likely possessing multiple modes and/or pronounced degen¬ 
eracies. In practice, nested sampling is well suited to such prob¬ 
lems, and therefore we adopt it here. 

Once one has obtained a set of posterior samples from the 
space (0, n), one may calculate Pr(/?|2), M) by simply marginal¬ 
ising out all other parameters to produce a marginalised posterior 
probability: 


-/ 


Pr(;i| £>, Al) = Pr(0, n\ D, M) d 0 




= I £(e,n)7t(0,n)d0, 

Z-A 


(6) 


(7) 


where Zm is the evidence for this hyper-model AT Since for 
any given value of n the union parameter space may be par¬ 
titioned into those parameters 0„ on which the model M„ de¬ 
pends and the remaining parameters <p n that are not used by M„, 
one may write the likelihood in (7) as _£(0„) and the priors as 
7f(0 |rc)=7r(0„ \n)n(<f> n )7r(n), where n(n) = Pr(«|Af). Hence (7) be- 


Pr(n| £>, Al) = ^ 
Zm 


i 


£(e„)7T(d n \n)de,„ 


( 8 ) 


where we have used the fact that the integral over the priors for 
unused parameters is unity, namely J d(/>„ 7r(</>„) = 1. We recognise 
the integral in (8) as the evidence Z„ of the model Af„, so that we 
have 


n(n) Z n = Zm Pr(n| £>, M). 


(9) 


We are interested in the posterior odds ratios between two models, 
Mi and Al,: 


Pij = In 


Pr(rc= j\ D,M) 
Pr(n=i\D, M) 


( 10 ) 


where the Zm cancels. Thus, the posterior odds ratio is given sim¬ 
ply by the ratio of values of the posterior Pr(/7| D, M) for the two 
models, which is obtained using the parameter estimation formula¬ 
tion of Bayes theorem and the process of marginalisation, without 
the need to calculate evidences directly. The key feature is that 
the unused parameters (f> n marginalise out to unity. Moreover, the 
posteriors on i j> n should simply equal the priors on <j) n . Visual in¬ 
spection of these posteriors thus provides a useful check that the 
method is performing correctly. 


A potential downside to this method is the requirement that 
the prior probabilities of the models are specified in advance. For 
signal detection problems with an unknown number of sources, for 
example Hobson & McLachlan (2003); Feroz & Skilling (2013), 
this is in principle undesirable but in practice a suitable prior choice 
can always be found. Additionally, if calculating posterior odds ra¬ 
tios for another model Mn+\ was desired, after having completed 
the analysis for the first N models, then a repetition of the method 
with only this new model and the most favourable model is pos¬ 
sible, at a computational cost of exploring the most favourable 
model 1 a second time. 

It is also important to note, however, that our new method 
does not produce an estimate of the error on the posterior odds 
ratios in a single computation, whereas this is possible when cal¬ 
culating evidences directly using nested sampling. Throughout we 
therefore use multiple repeat runs to obtain an error on the poste¬ 
rior odds ratios. 


4 APPLICATION TO TOY-MODELS 

In this section we demonstrate our approach by applying it to some 
toy-models and in the next section we apply our method to con¬ 
straining the dark energy equation-of-state as a function of redshift 
using recent cosmological datasets. 

In both applications we seek to model a one-dimensional 
function y(x) using a piecewise linear interpolation scheme be¬ 
tween a set of nodes and ask the model selection question “how 
many nodes are needed to fit the data?”. Thus we place a set of 
nodes y,(x,) in the plane, where the amplitude y, and the position X; 
are model parameters to be varied. At x m i n and x max fixed-position 
nodes are placed with varying amplitude only, such that for the 
model defined by n internal nodes there are 2 + 2 n parameters. As 
shown in Figure 1, linear interpolation is used to construct y at 
all points (with v(x) set constant outside the range [Xnu n , x max ]). Of 
course, other interpolation schemes between nodes may be used, 
such as splines, although we do not consider these here. The ap¬ 
plication of these approaches to constraining w{z ) is described 
by Vazquez et al. (2012b). 

A specific model is defined by how many nodes are used in 
reconstructing y(x). Comparing multiple models with increasing 
numbers of nodes identifies how many nodes are needed to fit the 
data, in other words the preferred complexity inherent in the data. 
As the final result, one can plot either Pr(y|x, n*), where n* de¬ 
noted the number of nodes in the most favoured model, or Pr(y|x) 
averaged over all models weighted by their posterior odds ratios 
(PORs) (Parkinson & Liddle 2013; Planck Collaboration et al. 
2015). Either approach identifies clearly the nature of the data con¬ 
straints on v(x). 

The key strength of the reconstruction is its free-form na¬ 
ture, which can capture any shape of function in the v(x) plane by 
adding arbitrarily large numbers of nodes. Providing the model se¬ 
lection criterion penalises over-complex models appropriately by 
weighing ‘goodness-of-fit’ against the numbers of parameters in 
the model (Occam's Razor), identifying how much complexity the 
data support is performed in a clear and unambiguous manner by 
the favoured number of nodes. Model selection techniques can thus 
be used to solve questions on the constraining power of the data, as 


1 The most favourable is best used, in light of discussions on the size of 
error bars in section 4.2. 
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X 

Figure 1. Illustration of the nodal reconstruction, which flexibly allows the parameter estimation process to define the preferred shape of y(x) from the data 
by linearly interpolating nodes whose amplitudes, positions (for internal nodes) and number can vary as required. The figure shows the interpolation process, 
and highlights how nodes can be positioned inside the unshaded prior space (with sorting of node positions such that Xj < Xi+ 1 ). 



(a) sin(27rjt): 47 points sampled from the sin(2/rA') function. 



O 0.2 0.4 0.6 0.8 1 

(b) line(2 ttx): 49 points sampled from the ]ine('2/rx) function. 


Figure 2. Data points plotted in the (x,y) plane for each dataset (a) and (b). The unshaded region represents the prior space for the y, amplitudes and x, 
positions of the nodes, over which a uniform prior is assumed (with sorting of the node position parameters such that x t < x,+i). 


successfully shown in various cosmological applications (Vazquez 
et al. 2012a,b; Planck Collaboration et al. 2015). 

The nodal reconstructions are clearly nested models. Since 
our general approach does not require this, for completeness we 
also consider a non-nested model selection problem by comparing 
a 2-internal node reconstruction with a sinusoidal model. The rest 
of this section presents the results obtained and highlights further 
strengths and weaknesses of our approach. 


4.1 Fitting a function to data 

Consider a set of j max data points {( Xj,yj ), j= 1, • • ■ , ./max) with ex¬ 
perimental errors {(cr x ., ay)} on each of the points. Assuming there 
is a functional relationship between the independent variable x and 
dependent variable y, captured by y=/( x), then the likelihood of 
observing these data is given by: 


mx j ,yj}\{cr Ij ,cr yj },f,X_,X + ) = 



(xr x j) 2 _ ( yj-nxj)f 

2 < 7 ^ . 2 CTy 


2,T(T, <r, .(.Y. - X-) 


(ID 


functions described above, and use posterior odds ratios to deter¬ 
mine how many nodes optimally reconstruct the function. 

We test 2 different datasets, shown in Figure 2. The traditional 
evidence-based approach and our new method for calculating pos¬ 
terior odds ratios are compared for each dataset. The constraints on 
y(x) given the data are also discussed. 

Dataset (a) has 47 datapoints drawn uniformly in x from the 
function y= sin(27tjc) in the range x e [0,1], with each point ad¬ 
justed in x and y by random Gaussian noise with mean=0 and 
cr=0.05 (error bars on datapoints are cr) 2 . Dataset (b) has 49 dat¬ 
apoints drawn as in (a) but from a piecewise-linear function coin¬ 
ciding with the function y= sin(2;rx) at x=0, 0.25, 0.75, 1, so that 
it is very difficult by eye to distinguish the two datasets as being 
drawn from different functions. We call the function used in (b) 
line(2;rx) for brevity. Clearly, a linearly interpolated nodal model 
with n =2 internal nodes can represent this function exactly. 

For each of the datasets we test models with 1 internal node 
up to 7 internal nodes (i.e. 3 total nodes up to 9 total nodes or 2 
line segments up to 8 line segments), using PolyChord (Handley 
et al. 2015) to calculate evidences (the vanilla method henceforth) 
and again using PolyChord to implement the new method (Post(n) 
method henceforth) 3 . PolyChord is a relatively new nested sam¬ 
pler and was found to be very suitable for this problem. We use 


where A_, X + are the end points of the uniform region in which the 
data points may be found a priori. A Bayesian derivation of this 
likelihood can be found in Appendix A; for more detail see Sivia & 
Skilling (2006). The integral is calculated numerically using stan¬ 
dard quadrature techniques. 

Given the data, the Bayesian approach is to use this likelihood 
to infer the probability distribution of the parameters in some para¬ 
metric form of the function /. We will do this for the family of 


2 50 points were drawn initially for each dataset, but some fellow outside 
the prior range due to the Gaussian noise, and were not included. 

3 Note the marginalised posterior probability on n is calculated from the 
chain_unnonnaUsed.txt file using the standard nested sampling technique 
(Skilling 2006). It is important to use this file over the usual chain.txt file 
and set up PolyChord to output all inter-chain points of the algorithm. This 
ensures good reconstruction of Pr(n| T). /VI) over the lower probability re¬ 
gions in light of the computing Tog-sum-exp' problem. 
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(a) sin(27nr) 


(b) line(27nj 




Figure 3. Posterior odds ratios (or Bayes factors) for datasets (a) and (b) defined by Figure 2. denotes the Bayes factor for the models with n and n' 
internal nodes. Histograms represent posterior odds ratios with respect to the most probable model. White, light grey and dark grey bars are for the vanilla, 
Post («)25 and Post(«) 5 o results respectively. Error bars shown are sample standard deviations obtained from 10 repeat trials. The posterior odds ratios agree 
well between methods. 


(a) sin( 27 r.v) 


(b) line(27rjr) 




Figure 4. Average timing data for datasets defined in Figure 2 and the vanilla, Post(n )25 and PostpOso results defined in the text. The shaded regions show 
the approximate number of likelihood calculations made for each model n and the solid lines show the cumulative numbers. More detail and an analysis of 
the timing benefits of using our new method are given in appendix B. Considering error bars on the posterior odds ratios for the different methods, it is clear 
that the PosLtnJso method (darkest plots) can produce comparable accuracy in less likelihood calls than the vanilla method (lightest plot). 


uniform priors on the y amplitudes of nodes, and sorted uniform 
priors on the x position parameters of nodes, where the x priors 
are uniform but forced to adhere to x, < x i+l to avoid the scenario 
where the n internal nodes are interchangeable with each other. We 
assign equal prior probabilities for each model, so PORs are equal 
to Bayes factors. 

Each dataset is analysed 10 times for each method to deter¬ 
mine the statistical uncertainty on the derived PORs. In each case 
the PORs are normalised to the model with the highest evidence 
in the vanilla method. Errors on the posterior odds ratios are given 
as the sample standard deviation from the 10 repeats. PolyChord 
was run with A live =25Adi m live points initially to obtain the results 
labelled Post(n) 2 5 , where N^ m =2n + 2 is the number of parame¬ 
ters to be explored (the dimension of the space) and the number of 
live points, N \ ive , is the only tuning parameter associated with the 
PolyChord sampling algorithm. To highlight accuracy and timing 
considerations when using the method, we also repeat the analysis 
with Afii ve =50A ? dim to obtain the results labelled Post(n) 50 . 


4.2 Results for nested nodal models 

The posterior odds ratios (or Bayes factors) for the vanilla method 
with Mive=25iVdim and Post(rc) method with Mive=25(Vdim and 
50A dim , per dataset, are shown in Figure 3 and show good agree¬ 
ment between the two methods regardless of N\ i ve . From this we 
conclude that the methods produce consistent posterior odds ratios. 
As one might expect, for the line(2;rx) dataset, the preferred model 


has n =2 internal nodes, whereas a larger number of nodes is pre¬ 
ferred for the sin(27rx) dataset. The reconstructions of the favoured 
models for each method are shown in Figures 5 and 6 respectively. 
The reconstructions are identical in all key features between meth¬ 
ods. The Post(n) 50 graph is not plotted as it was very similar. Fi¬ 
nally, the timing data in Figure 4 suggests that Post(n ) 2 5 results 
were faster to obtain by about a factor of 2.5 when using the same 
Aii V e P er parameter, however this comes at a cost in accuracy as the 
errors on the vanilla posterior odds ratios are clearly tighter than 
the Post(n )25 results. PostpOso, however, takes less time to produce 
similar accuracy for the significant posterior odds ratios. In gen¬ 
eral we observe that our method can produce Bayes factors faster 
than the vanilla method in a systematic manner, and discuss this in 
appendix B. 

The important discrepancies between the vanilla and Post(n) 
methods are in the errors on the posterior odds ratios, where we 
have identified 2 issues: firstly for large negative posterior odds 
ratios the errors from the Post(n) method are quite large and, sec¬ 
ondly, the errors on the vanilla method are tighter for equivalent 
Alive- The first discrepancy might be expected given that Poly¬ 
Chord, and nested samplers in general, rapidly converge to the 
central peak(s) in a distribution, thus spending less time in lower 
likelihood regions and sampling those regions proportionately less 
thoroughly. Given that each model investigated is a separate mode 
in the computation, a model with low likelihood will be less thor¬ 
oughly explored than the models with larger likelihoods - making 
the calculation of Pr(/i| 2), M) less reliable for these models. This 
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(a) sin(2/rjt) 



S=5 


2 cr 



(b) line(27rx) 



Figure 5. Reconstructions of y(;c) using the vanilla method of explicitly calculating evidences to obtain posterior odds ratios. Plots are from one of the 10 
trials, arbitrarily chosen, and are of the model with the largest posterior odds ratio, i.e. (a) 6 internal node model, (b) 2 internal node model. Each figure shows 
the posterior probability Pr(y|x, D, Al), in normalised slices of constant x to show the deviation from the peak y at each x, binned in 100 bins in both x and y. 
The colour bars to the right show the confidence intervals that the probabilities represent at a given slice in x as calculated from the inverse of the cumulative 
distribution function on Pr(y|x, !D, Ad), see Planck Collaboration et al. (2015) section 8.2 equation 68 for details. The lcr and 2<x intervals are plotted as black 
lines for clarity and the cube-helix colour scheme by Green (2011) is used for linearity in grey scale. In white is plotted the underlying function from which 
the data was sampled, and even with less than 50 datapoints a good reconstruction is obtained. 


(a) sin(27rx) 





2 cr 



(b) line(27rjc) 



Figure 6. Reconstructions of y(x) for the Post(n )25 results to obtain posterior odds ratios. Plots are for comparison to the vanilla results of Figure 5, and are 
plotted in the same way. The Post («)25 results agree well with the vanilla method results in all key features. 


is, however, desirable behaviour. Spending compute time only on 
probable models reduces the overall time taken to find the most 
probable model(s), whilst the less probable models are still sam¬ 
pled sufficiently well to identify them as less probable. 

The second discrepancy is more significant but equally pre¬ 
dictable. The number of live points in PolyChord defines how fully 
the space is explored. For the vanilla method, the (Vij v e=25./V dim 
calculation provides adequate sampling per model, whilst for the 
Post(n) method a similar number of live points needs to explore 
several models simultaneously, effectively reducing the live points 
available to explore each model and producing larger errors. This 
suggests that users need to ensure that algorithm tuning parameters 
such as /V llve are chosen appropriately and check that the results 
on repetitions of the algorithm are consistent. The Post(rc) 5 o results 
demonstrate clearly that results are confidently extracted in compa¬ 
rable compute-times when best practice is adhered to. Being aware 
of the increased modality of the space that is inherent to the method 
and ensuring that the sampling algorithm adequately handles such 
complex parameter spaces helps ensure accurate results. 

Finally, it is worth making some brief comments on the ‘phys¬ 
ical’ results of the model selection process for each of the datasets. 
In dataset (a) a more complex underlying shape in y(x) is identi¬ 
fied needing more nodes than dataset (b), consistent with the dis¬ 
tinction between sin(2;rx) and line(2;rx). It should be noted too 
that over-fitting (adding more parameters than needed) is not heav¬ 
ily penalised for dataset (b), as observed in the slow decrease in 


Bayes factors after the favoured model is found - this is standard 
behaviour (Sivia & Skilling 2006, p. 93) and can be understood by 
considering the Occam factor associated with a parameter which is 
constrained without increasing the fit of the model (MacKay 2003, 
p. 349). In general the model selection and nodal reconstruction 
technique produces strong conclusions on the shapes of the y(x) 
plane, given the data in each case, and clearly identifies the inher¬ 
ent complexity of the various datasets, as we desired it to. 


4.3 Results for non-nested models 

Our new method does not require that the models be nested. A 
model is nested inside another ‘larger’ model if setting some pa¬ 
rameters to specific values in the larger model allows one to obtain 
the smaller nested model. The nodal reconstructions are clearly 
nested in this sense. Here we quickly demonstrate that our method 
also works for non-nested models. 

We test datasets (a) and (b) against two models. The first 
model is the sinusoid function y(x)=A s'm(2nBx + C) + D and the 
second model is the 2 internal node reconstruction, so that we ex¬ 
pect dataset (a) to favour the sinusoidal model and (b) to favour 
the linear model. Parameters A and B are scale parameters for the 
amplitude and frequency respectively; we assign to these logarith¬ 
mic priors in the range [0.1,5]. Parameters C and D are shift pa¬ 
rameters and we assign uniform priors in the ranges \—n,i r] and 
[—1.5,1.5] respectively. These priors reflect sufficient coverage of 
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the prior space defined in Figure 2 and are adequate for comparing 
the vanilla and new methods. It is important to note that in this test, 
both the vanilla method and Post(n) method used N\„ e =25Na m . 
For the vanilla method this resulted in A^ii ve =100 for the sinusoidal 
model and Mive=150 for the 4 node model, whilst for the Post(n) 
method the parameters were searched simultaneously (along with 
n) to give 11 parameters and /V|, vc =275. 

The posterior odds ratios for dataset (a) favour the sinusoid by 
1.94 ± 0.93 and 2.01 ± 1.08 units, for vanilla and Post(n) methods 
respectively. The posterior odds ratios for dataset (b) favour the 
linear model by 13.82 ± 1.02 and 14.87 ± 2.58 units, respectively 
for vanilla and Post(u) methods. Taking into account the previous 
discussion, it is clear that the new method produces posterior odds 
ratios consistent with the vanilla method. The Post(n) method here 
was about 5 per cent slower for dataset (a) and 30 per cent slower 
for dataset (b). However, with the significantly larger number of 
live points that the Post(rc) method used, the fact that the meth¬ 
ods are of comparable time is a desirable result and suggests that 
the unconstrained parameters for a given n are not significantly in¬ 
creasing the compute time of those isolated nodes in the parameter 
space. 

In general we conclude that the discussions in section 3 re¬ 
garding unconstrained parameters is correct. When parameters 
were reviewed for the chains files produced in a given model, the 
parameters that were not used by that model were distributed ac¬ 
cording to their priors. This is one of the core strengths and nov¬ 
elties of the method and allows posterior odds ratios to be calcu¬ 
lated without constraints on the models to be compared. This veri¬ 
fies that the method works for non-nested models, and we proceed 
now to apply it to a cosmological application using the nodal re¬ 
construction. 


5 APPLICATIONS TO THE DARK ENERGY EQUATION 
OF STATE 

Having validated our approach on a toy problem, we now apply 
our method to a cosmological application, for which the vanilla 
method is not computationally suited. The aim is to demonstrate 
the method in a typical model selection application to obtain pos¬ 
terior odds ratios efficiently and with estimates of the error that do 
not require excessive repetition of long computations. We probe 
the dark energy (DE) equation of state parameter w(z) as a func¬ 
tion of redshift to update the work of Vazquez et al. (2012b), using 
more modern datasets. We further showcase the usefulness of the 
nodal reconstruction approach, briefly described in section 4 and 
more fully in Vazquez et al. (2012b), in defining the complexity 
supported by the data and identifying features in w(z), adding to 
the list of papers using the reconstruction (Vazquez et al. 2012a,b; 
Aslanyan et al. 2014; Planck Collaboration et al. 2015). 

5.1 Method 

We combine CMB data from the Planck 2013 data release (Planck 
Collaboration et al. 2014a,b,c) with the WMAP 9-year polarisation 
data (Bennett et al. 2013), Baryonic Acoustic Oscillation (BAO) 
from the BOSS data release 11 (Anderson et al. 2014) and super¬ 
novae type la (SNIa) data from the Union 2.1 catalogue (Suzuki 
et al. 2012) to provide constraints on DE behaviour. We focus on 
the redshift range z £ [0,2] in the reconstruction, where we set to 
constant values w(z)=w(2) when z > 2. We use the CosmoMC code 
package (Lewis & Bridle 2002), which contains the camb code 


(Lewis et al. 2000; Howlett et al. 2012), and substitute the MCMC 
sampler for the MultiNest nested sampling plugin running in con¬ 
stant efficiency mode (Feroz & Hobson 2008; Feroz et al. 2009; 
Feroz et al. 2013), which is a well established nested sampling im¬ 
plementation for evidence calculations and parameter estimation, 
and was the sampler used by Vazquez et al. (2012a,b) thereby en¬ 
abling a direct comparison. To facilitate deviations away from the 
standard ACDM equation of state parameter w =-1 we implement 
the ‘Parameterized Post-Friedmann' framework (PPF) modifica¬ 
tion to camb (Fang et al. 2008). For further details on the method 
and datasets see Vazquez et al. (2012b) and Planck Collaboration 
et al. (2014b) respectively. 

Using posterior odds ratios to identify the optimal number of 
nodes tells us the complexity of w(z) features supported by the 
data. Further, the nodal reconstruction, as shown in the toy model, 
is highly adept at identifying constraints in the (w, z)-plane. Of par¬ 
ticular interest is whether deviations in w(z) away from the success¬ 
ful ACDM cosmological model are supported by modern data and 
to identify which DE extensions are favoured. Theories incorpo¬ 
rating deviations from w=-l include quintessence scalar fields for 
w > -1 (Ratra & Peebles 1988; Caldwell et al. 1998; Tsujikawa 
2013) and phantom DE models with super-negative w < -1 (Cald¬ 
well 2002; Sahni 2005). The possibility of crossing of the phantom 
divide line at w=— 1 in dynamical models has also been considered 
(Zhang 2009). Modified gravity or brane-world models also make 
predictions about w(z) (Sahni 2005). Thus, paramount to under¬ 
standing DE is determining w(z). 

To do this we compare 6 models, in order of increasing com¬ 
plexity: ACDM with w=—l, wCDM with w constant in z but al¬ 
lowed to vary in amplitude, tiltC DM with w(z=0) and w(z=2) al¬ 
lowed to vary and linear interpolation for w(z) between them (0 
internal node model), and then nodal models with 1, 2 and 3 in¬ 
ternal nodes respectively. Models are abbreviated to A, w, 1, 1,2 
and 3 respectively, where appropriate. Priors on each w parame¬ 
ter are uniform on the range [-2,0] and were chosen to be con¬ 
servative, we did not check the robustness of results with respect 
to prior choice and leave this for future work, see Vazquez et al. 
(2012b) for such an analysis. Priors on each z parameter are uni¬ 
form on [0,2] such that for more than one internal node Zi < Z;+i 
(i.e. sorted uniform priors as in the toy model). The previous work 
by Vazquez et al. (2012b) found that ACDM was favoured, whilst 
the 2 internal node model had the second largest evidence, pointing 
to structure in w(z) that could not be captured by a constant equa¬ 
tion of state parameter wCDM, or even the 1 internal node model. 
Here we show clearly that Planck 2013 era datasets do not have 
this feature and only ACDM can be considered favoured. 

An important point is that the Planck data require the addi¬ 
tion of 14 so called nuisance parameters. These must be sampled 
and, together with the 6 parameters of CDM models, produce an at 
least 20 dimensional parameter space. As MultiNest is a rejection 
nested sampling algorithm, it is expected that computation times 
increase significantly in higher dimensions as the volume on the 
shell increases 4 . MultiNest has the algorithm search parameters 
Vii V e an d eff, where decreasing eff (in constant efficiency mode) 
typically achieves more accurate results more effectively than in¬ 
creasing (V live . 


4 Specifically, it constructs multi-dimensional ellipsoids to estimate sam¬ 
pling within an iso-likelihood region, as required by nested sampling. The 
ellipsoids expand by a fraction to ensure no viable regions of the true iso¬ 
likelihood contour are outside this estimate. Points are sampled inside these 
ellipsoids and rejected until meeting the nested sampling criterion. 
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Parameter Prior Range Prior Type 


Q. b h 2 

[0.019,0.025] 

Uniform 

Cl c h 2 

[0.095,0.145] 

Uniform 

100 Bmc 

[1.03, 1.05] 

Uniform 

T 

[0.01.0.4] 

Uniform 

n s 

[0.885,1.04] 

Uniform 

ln(10 10 A s ) 

[2.5,3.7] 

Uniform 

A PS 

•^100 

[0,360] 

Uniform 

A PS 

71 143 

[0,270] 

Uniform 

A PS 

*211 

[0,450] 

Uniform 

a cib 

71 143 

[0,20] 

Uniform 

a cib 

*-211 

[0,80] 

Uniform 

AfSZ 

71 143 

[0,10] 

Uniform 

r PS 

' 143x217 

[0,1] 

Uniform 

yCIB 
' 143x217 

[0,1] 

Uniform 

yCIB 

[-2,2] 

Uniform 

c 100 

[0.98, 1.02] 

Uniform 

^217 

[0.95, 1.05] 

Uniform 

gtSZ-CIB 

[0,1] 

Uniform 

j^kSZ 

[0,10] 

Uniform 

P\ 

[-20,20] 

Uniform 

w(Z;)|i=1...5 

[-2, -0.01] 

Uniform 

Zili=2..,4 

[0.01,2.0] 

Sorted-uniform 

n 

[A, h’, t, 1,2,3] 

Uniform 

^uniform 

[-2, -0.01] 

Uniform 


Table 2. The 30 priors that define the parameter space. The top set of pa¬ 
rameters are the CDM parameters, the middle ones show the nuisance pa¬ 
rameters associated with the Planck 2013 data release, and the bottom set 
are the parameters introduced by dark energy model extensions, including 
n for selecting between models and fAnn foi-ni for testing a MultiNest edge- 
effect problem. Planck Collaboration et al. (2014b) has more details about 
the CDM and nuisance parameters, whilst the dark energy extension pa¬ 
rameters are defined in the text. 


With the new method there seems to be no way to estimate the 
errors on the posterior odds ratios from a single run, and attaining 
these is best done via repeat simulation and the calculation of sam¬ 
ple standard deviations from these. We therefore performed 3 repe¬ 
titions each using Ai live =500 with eff=0.01 (the repeat runs) and the 
default July 2014 CosmoMC priors for the 20 CDM and nuisance 
parameters and the priors mentioned above for additional model 
parameters; an overview is shown in Table 2. Constant efficiency 
mode had to be used to attain feasible computing times, similarly 
the search parameters could not just be increased arbitrarily. With 
these MultiNest search parameters and constant efficiency mode, 
it was found that the edges of the priors were not sampled effec¬ 
tively. The error is reproducible with a 20-dimensional Gaussian 
test likelihood with a covariance matrix given by Planck chains. To 
ensure this problem had no impact on our results, firstly we added 
a prior for an unconstrained parameter, the 0 lm jf oml parameter in Ta¬ 
ble 2, which should produce a flat posterior. Observing the edge 
effects problem on this parameter gives a clear indication of the 
severity of the problem, and allows us to reconsider parameter esti¬ 
mation conclusions if needed. Secondly we tested for convergence 
of the marginalised posterior on n with respect to search parameter 
changes to ensure that our parameter estimation results were ro¬ 
bust. We thus performed a single further run using MultiNest with 



Figure 7. The posterior odds ratios obtained from the new method compar¬ 
ing the 5 DE extension models to ACDM. The error bars on each histogram 
are the sample standard deviations of the 3 repeat runs. It is clear that the 
2 sets of results agree very well, with discrepancies between them small 
compared both to the error bars and the absolute values used to draw con¬ 
clusions based on Jeffreys guideline. This shows that the results are robust 
with respect to changes in MultiNest search parameters, as required. Nu¬ 
merical results are given in Table 3. 


Bayes Factor 

Full Run 

Repeat Averages 

S Aw 

-2.41 ±0.03 

-2.55 ± 0.03 

Sa, 

-3.26 ±0.11 

-3.43 ±0.11 

Sai 

-3.54 ±0.32 

-3.97 ± 0.32 

Sa2 

-3.89 ±0.40 

-4.50 ± 0.40 

Sa3 

-4.31 ±0.63 

-4.94 ± 0.63 


Table 3. Summary of the Bayes factors from the 4 computations. The full 
run and repeat averages columns show results using the MultiNest search 
parameters discussed in the text. For both columns, the errors are sample 
standard deviations of the 3 repeat trials. The results agree well within lcr 
confidence intervals for all but the where a larger discrepancy occurs 
due to small error bars despite a small difference in log-units. The results 
show clearly that the new method implementation is robust to changes in 
MultiNest parameters. 


the search parameters Af/, ve =1000, eff=0.005 (full run) for which 
the edges of the prior were sampled effectively. Given the con¬ 
cerns about the accuracy of the MultiNest evidence calculation for 
Planck data (due to nuisance parameters, high dimensionality, and 
the need for constant efficiency mode), the new method combined 
with the 2 robustness checks thus provides a valuable alternative 
way to obtain posterior odds ratios. 

5.2 Results 

The posterior odds ratio results for the full run and the 3 repeat runs 
are shown in Figure 7 and Table 3. The key points are firstly that the 
posterior odds ratios are consistent with each other, demonstrating 
convergence of Pr(/?| D, M ) with respect to MultiNest search pa¬ 
rameters, and secondly that the w{z) investigation clearly favours 
ACDM. 

The toy model showed that error bars on posterior odds ratios 
will depend on how thoroughly the sampling explores the space. 
Note that the error bars used are the sample standard deviations 
from the posterior odds ratios of the 3 repeat runs. The repeat run 
posterior odds ratios are consistent with the full run and sufficiently 
tight to resolve differences to make conclusions based on Jeffreys 
guideline, suggesting that the space is well explored. This conver¬ 
gence on reruns, together with the convergence between different 
MultiNest search parameters, suggests that the posterior odds ratio 
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Figure 8. The w(z) priors, w(z) reconstructions and parameter constraints for each of the 5 model extensions beyond ACDM. The leftmost plot is the prior 
space on the function w(z) as a result of our flat priors on amplitude and position parameters and the central plots show the posterior on w(z) defining the data 
and model constraints on the w(z)-plane. These plots show the posterior probability Pr(w|z) similar to Figure 5. Here it is the probability of w as normalised 
in each slice of constant z , with colour scale in confidence interval values shown. The lcr and 2cr confidence intervals are plotted as black lines. Note that the 
prior on w(z) in wCDM does not appear flat in this plotting style despite being so. Comparing the priors of the other 4 reconstructions to the flat wCDM prior 
it is noticed that the priors on w(z) are slightly favouring the central values closer to w=-l as expected when calculating priors analytically. The posteriors 
show that the data constrains w'(z) strongly compared to our priors. Rightmost are the ID and 2D marginalised posteriors of the additional model parameters. 
Plots were produced using GetDist and with the cubehelix colour scheme by Green (201 1) for linearity in grey scale. 


results are robust. Additionally, the edge effect problem previously 
mentioned was thoroughly checked for using an unconstrained pa¬ 
rameter 0 u „if orm . The posterior of ^uniform was close to flat for all 
runs. The edge effect problem presumably affects all parameters 
a small amount, as the strength of this effect is different between 
the different MultlNest search parameter settings whilst the pos¬ 


terior odds ratios are consistent, it suggests that the posterior odds 
ratios are not significantly biased. From these 4 runs we therefore 
conclude that we have accurate posterior odds ratios and proceed 
to quote those of the full run combined with the errors from the 3 
repeat runs as upper estimates for those of the full run (as repeats 
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Figure 9. Summarising the DE model extension results for the constraints on the w(z) plane. The 5 extension models, excluding ACDM, are weighted by 
their evidences to give a model averaged plane reconstruction (Parkinson & Liddle 2013; Planck Collaboration et al. 2015), and plotted as in Figure 8. When 
including ACDM, approximately 85 per cent of the central confidence interval region is contained in the line w=~ 1 due to the strength with which ACDM is 
favoured by the posterior odds ratios, almost 2<x. The two plots show the prior space (left) contracting down to the posterior odds ratio averaged wfr) plane 
reconstruction (right), as discussed in the text. It is clear that ACDM is well within the favourable region, with the 1 cr contours easily containing w=- 1. 


of a more well sampled run will produce tighter estimates, shown 
in the toy model when doubling Aii ve ). 

From these posterior odds ratios it is clear that ACDM is the 
only favourable model. The decrease in posterior odds ratios with 
an increase in the number of parameters to model DE suggests that 
further additions of parameters to model deviations from ACDM 
are penalised more strongly by the Occam’s Razor principle than 
the gain in constraining power that they provide. One can estimate 
the Occam factor associated with adding an additional nodal ampli¬ 
tude parameter, using the analysis in (MacKay 2003, page 349), as 
<t w \ D /<t w , where is the width around the peak of a Laplace ap¬ 
proximation inside the evidence integral and cr w is the prior width. 
We estimating cr n .\ D lcr„, for non-Gaussian parameters with a full 
width half max (FWHM) calculation of the ID marginalised w- 
amplitude posterior. Doing this for the wCDM model's additional 
parameter yields a drop in the Bayes factor due to the approxi¬ 
mated Occam factor of -2.63. The observed -2.41 ± 0.03 therefor 
suggests that the parameter is not improving the likelihood fit to 
the data significantly. Doing something similar for the 3 internal 
node model gives an Occam factor of -0.45 (using the average of 
the 5 amplitudes; assuming that an additional z-position parameter 
is unconstrained as there are no additional w(z ) features it would 
constrain). This is the anticipated decay in the posterior odds ratio 
when adding unnecessary nodes, and the Bayes factor drop from 
2CDM to 3CDM at -0.42 suggests that 3 nodes already saturate 
the w(z) space. 

A clear and strong conclusion from this analysis is that there 
is considerably less evidence for deviations from ACDM in the 
Planck era datasets used here than in the WMAP era datasets used 
by Vazquez et al. (2012b), which is consistent with other results 
(Planck Collaboration et al. 2014b; Shafer & Fluterer 2014). The 
next most favoured model is the next simplest one. wCDM, and 
at a posterior odds ratios of -2.41 ± 0.03 it is almost significantly 
disfavoured according to the Jeffreys guideline. All other models 
are significantly disfavoured at between 3.3 to 4.3 log units. 

The constraints in the (w, z)-plane for each of the model ex¬ 
tensions beyond ACDM, shown in Figure 8, do however indicate 
some deviations from w=— 1. Typically the data seem to favour the 
phantom region, potentially more so at the ends of the considered 
redshift range and less so at redshift 0.4-0.7, where the data gives 
the tightest constraints. Flowever, the 1 cr and 2 cr contours clearly 
indicate that these effects are not significant. At all z and for all 
models, w=-l is comfortably within the peak of the Pr(w|z) dis¬ 
tribution and more so in the regions where we have strong data 


constraints, suggesting that any deviations or apparent systematic 
patterns are dominated by a lack of data. The plane reconstructions 
also support the model selection conclusions that ACDM is signif¬ 
icantly favoured over other models, as the constraints in the data 
do not deviate from w=-l beyond even lcr. 


The correct Bayesian way to view the w(z) plane reconstruc¬ 
tions for all models considered is to sum over all the models 
whilst weighting by the Bayesian evidence, or equivalently pos¬ 
terior odds ratios. This is exceptionally easy to implement with 
our new method, as a program like GetDist (included with Cos- 
moMC) can use the chains file produced by the new method to 
correctly weight all the models automatically whilst marginalis¬ 
ing out the parameter n. Figure 9 shows this for the 5 DE exten¬ 
sion models beyond ACDM. When plotting with ACDM the plot 
is centered on w=—1, with 85 per cent of the peak confidence in¬ 
terval region contained in the w=-l line, and thus a plot show¬ 
ing only the model extensions is more insightful. The plane re¬ 
construction shows clearly the constraining power of the data at 
different redshifts as our knowledge of w(z) moves from the prior 
on the left to the posterior on the right. The result is a tightly con¬ 
strained function of w(z) slightly below -1 for all redshifts, sug¬ 
gesting a small favouring of the phantom region at an insignificant 
level. Most importantly, ACDM is fully compatible, well within lcr 
of the model extension results, as is expected given the Bayesian 
model selection analysis. This insignificant deviation away from 
w——1 explains clearly why ACDM is so heavily favoured. 


Of practical importance is the strength with which the nodal 
reconstruction identifies features, and especially that the recon¬ 
struction is data driven. Most of our datasets that can constrain 
w(z) are in the redshift range z s [0.5,0.8] and this is shown by 
where the reconstructions most tightly constrain the plane. This 
reconstruction technique is clearly of merit and in the future, with 
more powerful datasets, can hopefully act as a tool to identify fea¬ 
tures (if any) in w(z). At present, the work here can only suggest 
that dark energy models with w(z) close to -1 are needed. Finally, 
the posteriors of the CDM parameters are plotted in Figure 10 for 
each of the 6 models tested. The posteriors of the DE extensions 
agree well with the ACDM values, as can be expected given that 
there is no significant deviation from w=- 1. 
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6 CONCLUSIONS 

We demonstrated a novel method for calculating posterior odds 
ratios through a toy model application and then applied it to a cos¬ 
mological model selection problem. 

Our new method uses Bayesian parameter estimation on a pa¬ 
rameter that switches between models, via a hyper-likelihood that 
wraps around the individual model likelihoods, to infer posterior 
odds ratios (or Bayes factors if desired) without calculating evi¬ 
dences. It uses novel partitioning of the parameter space via the 
parameter n, and marginalisation of posterior probabilities, to al¬ 
low sampling of a variable length parameter space when moving 
between models, thus facilitating any models to be tested without 
restriction and without reversible jump Monte Carlo techniques. 
To use the method one needs to have a parameter estimation algo¬ 
rithm capable of sampling from multi-modal spaces and to decide 
which models one wants to test a priori. 

The toy model demonstrated clearly that the method is valid 
and consistent with the existing method of calculating posterior 
odds ratios by evaluating evidences. We conclude that the new 
method is not necessarily faster, despite avoiding evidence inte¬ 
grals, for 2 reasons. Firstly, to get errors on the posterior odds ra¬ 
tios it requires rerunning several times, whereas nested sampling 
algorithms such as MultiNest and PolyChord can attain error es¬ 
timates of evidences from a single run. Secondly, the parameter 
space needs to be explored comparably thoroughly in both meth¬ 
ods, as shown by the increase in error bars on the posterior odds 
ratios in the toy model when spending less computational time on 
the new method. 

A peculiar feature of the new method in combination with 
nested sampling (which likely applies to other samplers too) is 
that computation time dedicated to a model is dependent on how 
strongly the model is favoured over others. Less favoured models 
become depopulated with live points as the nested sampling algo¬ 
rithm removes lowest likelihood points. As a result, we observed 
that less favoured models typically had less accurate posterior odds 
ratio calculations, which helps to reduce computing time, but still 
in such a way that they were always identifiable as less favoured. 
The reduction in computing time can be substantial, especially in 
applications where there are a number of computationally expen¬ 
sive models with low posterior odds ratios. 

The toy models illuminated precautionary measures that best 
be adhered to by users. As with all Bayesian parameter estima¬ 
tion, robustness of posterior probabilities to changes in algorithm- 
specific tuning parameters needs to be tested for and in the case 
of the new method, where a posterior is used to infer evidence ra¬ 
tios, it is especially important to check this. It is best to test that 
the posterior odds ratios obtained from the posterior on n are con¬ 
sistent on repetitions of the algorithm and also that the error bars 
attained from repetitions are sufficiently small if needing to make 
judgments based on Jeffreys guideline. The toy model also high¬ 
lighted the strength of the nodal reconstruction in identifying fea¬ 
tures in y(x) plane reconstruction problems. We conclude that it is a 
useful tool for analysing the complexity supported by the data and 
add to the volume of literature using it (Vazquez et al. 2012a,b; 
Aslanyan et al. 2014; Planck Collaboration et al. 2015). 

Thereafter, taking the above considerations into account, the 
new method was used to attain posterior odds ratios in a cosmo¬ 
logical context where direct evaluation of evidences can be com¬ 
putationally demanding and problematic. We applied the nodal 
reconstruction technique to reconstruct the dark energy redshift- 
dependent equation of state parameter w(z), analysing the dynamic 
behaviour supported by modem datasets in a search for deviations 


from the ACDM model (w=-l). This was principally an update 
on a paper using WMAP era data by Vazquez et al. (2012b). We 
concluded that ACDM is significantly favoured above any nodal 
reconstruction applied. Additionally, the model allowing w to vary 
as a constant is almost significantly disfavoured at -2.41±0.03 log- 
units of the posterior odds ratio with respect to ACDM. We con¬ 
clude that additional parameters are systematically disfavoured: in¬ 
creasing the complexity of the w(z) reconstruction decreases poste¬ 
rior odds ratios with respect to ACDM. The Occam’s Razor effect 
penalises additional parameters when using posterior odds ratios 
to do model selection and, as ACDM is an excellent fit to current 
cosmological data, the addition of parameters to extended beyond 
ACDM adds less to the constraining power of the models than the 
Occam’s factor penalises. 

The robustness of the results and methods were confirmed in 
several ways. Figure 10 shows that the CDM parameters of each of 
the dark energy extension models agree well with the ACDM val¬ 
ues, as is expected given that all models agree well with w=- 1. 
Further, a potential problem in sampling the edges of priors in 
high-dimensions was identified with MultiNest when using con¬ 
stant efficiency mode, but through tracking an unconstrained pa¬ 
rameter ^uniform> it was shown to be insignificant given the final 
search parameters used. General robustness of the new method was 
confirmed too by repeating the calculation of Pr(n| D, AO with dif¬ 
ferent search parameters and showing that the value of Pr(/z| D , Al) 
had converged with respect to algorithm tuning parameter. 

Finally, the cosmological application demonstrated the 
strength of the new method, attaining posterior odds ratios without 
needing evidence calculations and effectively dealing with param¬ 
eter spaces of varying length. Errors on the posterior odds ratios 
were attained through repeat runs with a faster sampling parameter 
setup which doubled to confirm that the posterior odds ratios were 
converged and accurate. As such a robustness check is important 
for any parameter estimation or model selection problem, where an 
algorithm uses tuning parameters for the sampling, this approach 
should come at little extra cost in practice. 
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Figure 10. The CDM parameter ID marginalised posteriors for each of the 
6 models tested. As MultiNest converges to the peak likelihood regions, 
the datapoints output to the chains file are more sparse for some of the 
models. Typically ACDM had 8 times more points than wCDM with which 
to accurately reconstruct these posteriors. The lower posterior odds ratio 
models had less still and this leads to a lower quality reconstruction for the 
less favoured models. Nevertheless, it is clear that the models agree well 
and there are no significant deviations from the ACDM values of the CDM 
parameters, as can be expected given the only slight deviation from w=— 1 
in each model. 
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APPENDIX A: LINE FITTING LIKELIHOOD 

We aim to fit a parametric function y=f(x ) to a set of y' max data 
points [Xj,yj}, where we have some knowledge of the errors on 
these measurements {cr*., cr y .) ({_/'= 1, • ■ • , _/ max )). In order to fit the 
function, one needs to calculate the likelihood of observing the data 
[Xj, yj], given the function /, the observed errors and any additional 
assumptions we must make /: 

mxj,yjm<r X} ,<r yj },f,I). (Al) 

To model the “error bars”, we assume that each of the data 
points (xj, yj) is drawn from a separable Gaussian distribution with 
covariance diag(crj., cr y .). The distribution will be centered about 
some true value (Xj, Yj), where these values are unknown and will 
need to be marginalised over as nuisance parameters in the final 
calculation. If each of these distributions are independent from 
each other, we arrive at the likelihood: 

Yi({ Xj ,yj}\{X j ,Yj\,{o- xr CT yj }) = 


7 max i 

1~I 1 

l 

i 

1 

U CXP 

2(T 2 

X J 

2cr yj 


To marginalise out the nuisance parameters, we place our 
prior assumptions on them. We shall assume that the true Xj values 
are drawn uniformly in some range X_ < Xj < X + , and we shall as¬ 
sume that the true Yj obey the functional relationship: Yj = f(Xj). 
Given this, the probability distribution is: 


mXj,Yj}\f,X_,X + ) 


J max r -. 

V n s[Yj-AXj)\ 

j=i 


:X_<Xj<X + 

: otherwise 


(A3) 
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where 6 is the Dirac 5-function. Multiplying (A2) and (A3) to¬ 
gether and marginalising out {Xj, Yj] by integrating yields the like¬ 
lihood: 


Pr ({xj,yj}\{o- xr ir yj },f,X.,X + ) = 



exp 


2o-t . 


(yj-XXj))- 
2 or. 


(A4) 


This procedure may be straightforwardly extended to consider cor¬ 
related error bars where the covariance matrix of (A2) is no longer 
diagonal. One may also adjust (A3) if some additional knowledge 
is known about the independent variables Xj. For further details the 
reader is referred to Sivia & Skilling (2006). 


APPENDIX B: EFFICIENT COMPUTING OF BAYES 
FACTORS 

Using the datapoints in figure B1 to test the vanilla and Post(n) 
methods we demonstrate that our new method may outperform 
the evidences approach in a systematic fashion that makes the 
approach desirable for common astrophysical and cosmological 
problems. 

Running the nodal reconstruction technique with models of 
1 internal node up to 13 internal nodes (3 to 15 total nodes) we 
obtain Bayes factors and timing results shown in figure B2. The 
timing data shows the number of posterior points, and thus likeli¬ 
hood calculations up to a factor of the PolyChord efficiency, that 
each method makes for each of the nodal reconstruction models 
(shaded plots), alongside the cumulative number of likelihood 
calculations of these models (line plots). 

Using the vanilla method, completing the evidence calcula¬ 
tion for each model means that adding increasingly complex 
models is increasingly computationally expensive. In the Post(n) 
method, however, the model space is rapidly traversed from 
lower likelihood regions to higher likelihood regions, so that 
computationally expensive models with low likelihoods (or more 
correctly, with lower Bayes factors compared to other models in 
the space) are explored rapidly by the nested sampling algorithm. 
This is clearly identified by the fact that the Bayes factors and 
the number of likelihood calculations peak at the same model (4 
internal nodes) and tail off similarly for models on either side of 
this. 

It is worth noting, however, that the Post(n) method performs more 
likelihood calculations for the most probable models, because the 
additional overhead of setting up the other parameters and populat¬ 
ing their dimensions with live points (because throughout we use 
that Aii ve ^ Adim) means that the algorithm progresses more slowly. 

Astrophysical and cosmological problems where a number 
of models of increasing complexity are explored may therefor 
benefit from using this method. It is not guaranteed, however, 
as with the vanilla case one may have identified a drop off in 
the Bayes factors beyond n = 8 and stopped testing the more 
complex models thereafter. Nonetheless, the Post(zi) method could 
provide an efficient means of verifying the drop off (for example 
one might run the above with 7t(n)=[4,9,10,11,12,13] as a fast 
means of verifying the shape). Any gains in performance must 
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Figure Bl. A set of 11 datapoints defining a spike in the x-y plane. We test 
this dataset with models of 1 internal node up to 13 internal nodes (3 to 15 
total nodes). 




Figure B2. Bayes factors with respect to the most probable model (top) 
and timing data (bottom) for the vanilla method and the Post(n) method 
using 25A dim and 50A d j m number of live points. Note that the large er¬ 
ror bars on the dataset in figure B1 allow models that underfit with less 
than 3 internal nodes (1 at each vertex of the spike signal) to be probable. 
The timing data is measured by the number of likelihood calculations the 
algorithm makes. The shaded regions show the time taken on each nodal- 
reconstruction model for the vanilla (lightest colour plotted). Post(n) 25 , and 
Post(«) 5 o (darkest colour plotted) methods. Observe that the shapes of the 
Post(n) method timing data coincides with those of the Bayes factors, as 
explained in the text, and thus outperforms the vanilla method in obtaining 
Bayes factors accurately. 


be considered against the need for repetition of the algorithm to 
obtain an estimate of the error on the Bayes factors. 


MNRAS 000, 1-13 (2015) 


























